A  New  Parallel  Optimization  Algorithm 
for  Parameter  Identification 
in  Ordinary  Differential  Equations1 

J.E.  Dennis,  Jr. 

Karen  A.  Williamson 

September,  1988 
TR88-12 


1  Department  of  Mathematical  Sciences,  Rice  University,  P.  O.  Box  1892,  Houston,  TX 
77251.  This  research  was  sponsored  by  AFOSR  85-0243,  DOE  DE-FG05-86ER25017,  SDIO/IST/ARO 
DAAG-03-86-K-0113,  and  NSF  grant  CCR-86-19893.  This  paper  was  prepared  by  invitation 
for  delivery  at  The  27th  IEEE  Conference  on  Decision  and  Control,  December  1988. 


Report  Documentation  Page 

Form  Approved 

OMB  No.  0704-0188 

Public  reporting  burden  for  the  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information, 
including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington 

VA  22202-4302.  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no  person  shall  be  subject  to  a  penalty  for  failing  to  comply  with  a  collection  of  information  if  it 
does  not  display  a  currently  valid  OMB  control  number. 

1.  REPORT  DATE 

SEP  1988  2.  REPORT  TYPE 

3.  DATES  COVERED 

00-00-1988  to  00-00-1988 

4.  TITLE  AND  SUBTITLE 

A  New  Parallel  Optimization  Algorithm  for  Parameter  Identification  in 
Ordinary  Differential  Equations 

5a.  CONTRACT  NUMBER 

5b.  GRANT  NUMBER 

5c.  PROGRAM  ELEMENT  NUMBER 

6.  AUTHOR(S) 

5d.  PROJECT  NUMBER 

5e.  TASK  NUMBER 

5f.  WORK  UNIT  NUMBER 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Computational  and  Applied  Mathematics  Department  ,Rice 

University, 6100  Main  Street  MS  134, Houston, TX, 77005-1892 

8.  PERFORMING  ORGANIZATION 

REPORT  NUMBER 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

10.  SPONSOR/MONITOR'S  ACRONYM(S) 

11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

14.  ABSTRACT 

15.  SUBJECT  TERMS 

16.  SECURITY  CLASSIFICATION  OF:  17.  LIMITATION  OF 

18.  NUMBER  19a.  NAME  OF 

a.  REPORT  b.  ABSTRACT  c.  THIS  PAGE 

unclassified  unclassified  unclassified 

17 

Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std  Z39-18 


Abstract 


Often  in  mathematical  modeling,  it  is  necessary  to  estimate  nu¬ 
merical  values  for  parameters  occurring  in  a  system  of  ordinary  dif¬ 
ferential  equations  from  experimental  measurements  of  the  solution 
trajectories.  We  will  discuss  some  of  the  difficulties  involved  in  the 
solution  of  this  problem,  and  we  will  describe  a  new  parallel  quasi- 
Newton  algorithm  for  finding  values  of  the  parameters  so  that  the 
numerical  solution  of  the  state  equation  best  fits  the  observed  data  in 
the  weighted  least  squares  sense. 


1  Introduction 


The  ability  to  numerically  estimate  parameters  that  appear  in  differential 
equation  models  of  dynamic  processes  is  crucial  to  the  research  effort  in  a 
variety  of  experimental  science  and  engineering  areas.  Consider,  for  example, 
the  process  of  modeling  in  chemical  kinetics.  One  of  the  important  goals  in 
this  field  is  to  determine  which  elementary  reactions  constitute  the  mecha¬ 
nism  of  a  complex  reaction.  First,  a  model  consisting  of  a  system  of  elemen¬ 
tary  chemical  reactions  is  formulated  to  describe  the  proposed  mechanism  of 
the  complex  reaction  system.  The  proposed  chemical  model  determines  the 
dependence  of  the  reaction  rate  on  the  concentrations  of  the  reactants  and 
products.  In  this  way,  the  chemical  model  is  converted  into  a  mathematical 
model  in  the  form  of  a  system  of  rate  laws.  This  mathematical  model  then 
consists  of  a  system  of  ordinary  differential  equations  which  describes  the 
rate  of  change  of  the  concentrations  of  the  individual  chemical  species  over 
time.  In  this  model  are  parameters  known  as  rate  constants  which  relate  the 
concentrations  of  the  species  to  the  rate  of  the  reaction.  Then,  given  some 
experimental  measurements  of  the  concentrations  of  the  species  at  various 
times,  we  would  like  estimates  of  the  rate  constants.  The  proposed  reac¬ 
tion  mechanism  can  be  evaluated  by  considering  how  well  the  solution  of 
the  dynamical  system  with  the  final  estimates  of  the  rate  constants  fits  the 
experimental  measurements. 

For  example,  the  following  chemical  reaction  system; 

IrCl26~  +  NO  —  IrClt  +  HN02  +  H+ 

2  N02  ^  N03  +  HN02  +  H+ 

2  HN02  ^  NO  +  NO2, 

with  rate  constants  k4,  fc2,  k3 ,  k4,  k5,  and  ke,  has  been  under  investigation  by 
Ram  and  Stanbury  [13].  The  model  for  this  reaction  mechanism  is  a  fifth- 
order  system  of  ordinary  differential  equations.  Given  some  measurements 
of  the  absorbance  of  IrCl\~  at  various  sample  times,  the  objective  is  to 
estimate  values  for  the  rate  constants  of  the  reactions,  i.e.  k4, . . . ,  ke.  Then, 
the  proposed  reaction  mechanism  can  be  evaluated  by  considering  how  well 
the  integrated  solution  of  the  model  with  the  final  value  of  the  parameter 
vector  fits  the  data. 

Important  problems  of  this  type  arise  in  diverse  areas  of  science  and  en¬ 
gineering.  In  addition  to  chemical  kinetics,  other  examples  of  parameter 
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estimation  problems  can  be  found  in  biology,  biochemistry,  and  robotics.  In 
this  paper  we  will  describe  some  standard  approaches  to  the  solution  of  the 
parameter  identification  problem  in  systems  of  ordinary  differential  equa¬ 
tions.  Then,  we  will  describe  a  new  algorithm  that  is  based  on  the  Celis, 
Dennis,  and  Tapia  trust  region  algorithm  [7],  [8]  for  equality  constrained  op¬ 
timization  problems.  This  new  algorithm  should  be  both  more  efficient  and 
more  stable  than  standard  solution  techniques,  and  it  also  provides  a  flexi¬ 
ble  framework  for  introducing  parallelism  into  the  parameter  identification 
problem. 


2  Problem  Formulation 


In  order  to  discuss  the  new  algorithm,  we  will  first  establish  some  nota¬ 
tion.  Let  IRn  denote  the  real  n-dimensional  Euclidean  space.  Let  p  — 
(pi,p2,  •  •  •  ?Pwp)T  G  IRNp  be  the  vector  of  parameters  to  be  estimated,  y  = 
(yi,  y2, . . . ,  VNy)T  G  JRNy  be  a  state  vector,  and  t  G  IR  be  the  independent 
variable,  usually  interpreted  as  time.  We  assume  that  the  system  we  are 
studying  is  modeled  by  a  parameterized  first-order  system  of  ordinary  differ¬ 
ential  equations 


dy 

dt 


F(t,y;p);  with  initial  conditions  y(t0)  =  yo, 


(1) 


where  F  :  IRx  IRNy  x  IRNp  — >  lRNy  satisfies  some  continuity  conditions.  Let 
y(t;p)  E  lRNy  denote  the  solution  of  (1)  at  time  t  with  the  parameter  vector 
p,  and  let  p )  denote  its  i-th  component.  Then,  given  a  set  of  data  points 


(tdatap-,  ydatatj);  i  =  1, . . . ,  Ny;  j  =  0, . . . ,  Nd(i),  (2) 

where  ydata^  is  an  approximation  to  ?/;(tdatatj; p„)  for  some  unknown  pa¬ 
rameter  vector  p*  ,  the  parameter  identification  problem  is  to  determine  an 
estimate  of  p*  ,  such  that  the  solution  of  (1)  with  this  parameter  vector  “best 
fits”  the  given  data.  In  this  work,  we  will  consider  “best  fit”  to  mean  that 
p*  minimizes  the  sum  of  the  squares  of  the  deviations  between  the  numerical 
solution  of  (1)  and  the  data  points,  ydata^-,  at  the  time  points  t d a t a y ;  i.e.  p* 
solves  the  following  optimization  problem: 

1  Ny  Nd(i) 

minimize  /(p)  =  (y.(tdatab;p)  -  ydatap)2-  (3) 

1  i= 1  j= o 
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Notice  that  we  have  included  the  components  corresponding  to  initial  values 
at  t0  in  the  objective  function,  and  we  have  included  initial  values,  y(to), 
in  the  statement  of  the  model,  given  in  (1).  Ideally,  we  would  have  exact 
initial  values  for  all  of  the  components  of  y  in  the  form  of  the  data  points 
corresponding  to  to,  and  thus, 

yi(t0)  =  ydata-0;  i  =  1, . . . ,  Ny  . 


However,  it  is  clear  that  there  will  be  cases  in  which  some  or  all  of  the  initial 
values  are  known  only  approximately.  In  these  situations,  some  or  all  of 
the  components  of  y(f0)  should  be  treated  as  parameters.  We  use  a  vector, 
ivpar  G  IRNy,  to  keep  track  of  which  components  of  the  initial  values  are 
being  included  as  parameters,  i.e.: 

{0  if  yi(to )  is  not  included  as  a  parameter  ,  s 

1  if  yi(t0 )  is  included  as  a  parameter,  ' 


and  let  N{v  =  Ya=i  ivpar;  denote  the  number  of  initial  values  that  are  included 
as  parameters.  The  dimension  of  the  parameter  vector,  still  denoted  by  Np, 
now  includes  both  the  number  of  parameters  contained  in  F(t,y,p)  and  the 
number  of  initial  values  that  are  included  as  parameters.  Therefore,  we  will 
use 


j  ydatai0  if  ivpar,  =  0 
\  PNp-Ntv+i  if  ivpar-  =  1 


(5) 


as  the  initial  conditions  for  the  initial  value  problem  given  in  (1).  We  will 
define  the  residual  vector,  R(p )  =  (Ri,  R-2,  •  •  • ,  Rnr)T  £  IRNr ,  such  that 
for  i  =  1, . . . ,  Ny  ,  j  =  1, . . . ,  Nd(i)  +  ivpar;  ,  and  k  =  j  +  £m=i  Nd{m)  + 

ivparm  , 


Rk{p)  =  yi(tdataii;p)  -  ydata8j  ,  (6) 

and  the  dimension  of  the  residual  is  given  by 

Ny 

NR  =  Niv  +  '£Nd(i). 

i= 1 


Thus,  the  objective  function  f(p)  given  by  (3)  is  equivalent  to 

m  =  \mTR(p)  =  jiiKMin,  p) 
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and  it  has  the  special  structure  of  the  nonlinear  least  squares  objective  func¬ 
tion.  We  should  point  out  that  in  some  problems  there  is  motivation  to  use 
a  weighted  least  squares  objective  function,  but  for  simplicity,  we  will  not 
include  the  weighting  factors  in  this  discussion. 

At  this  point,  some  remarks  about  the  form  we  have  assumed  for  the 
problem  are  appropriate.  We  have  chosen  this  complex  structure  for  the 
data  set  because  its  flexibility  allows  us  to  consider  some  real  applications 
that  would  otherwise  be  excluded.  The  general  data  structure  given  in  (2) 
allows  us  to  include  cases  in  which  no  data  points  are  available  for  some  of  the 
components  of  y  and  cases  in  which  the  data  points  for  different  components 
of  y  do  not  occur  at  the  same  time  points.  To  accomplish  this,  Nd(i)  denotes 
the  number  of  data  points  available  for  the  i-th  component  of  the  solution 
vector  and  tdata,j  denotes  the  time  corresponding  to  the  j-th  data  point  of 
the  i-th  component  of  y,  i.e.  ydata-. 

In  addition,  we  have  assumed  that  the  model  is  a  first-order  normal  sys¬ 
tem  of  differential  equations.  Since  any  general  n-th  order  differential  equa¬ 
tion  which  is  resolved  with  respect  to  the  n-th  derivative  can  be  transformed 
into  an  equivalent  first-order  system,  this  assumption  is  not  restrictive. 

3  Initial  Value  Approaches 

The  most  straightforward  approach  to  solving  (3)  is  to  use  a  quasi-Newton 
method  from  unconstrained  optimization  for  the  minimization  and  an  initial 
value  method  for  the  required  numerical  integration.  In  order  to  fully  exploit 
the  structure  of  the  problem,  we  can  use  a  nonlinear  least  squares  algorithm 
such  as  an  augmented  Gauss-Newton  trust  region  method  [10]  to  minimize 
the  objective  function  (7).  Let  J(p )  E  1RNrxNp  denote  the  Jacobian  matrix, 
i.  e.  the  first  derivative  matrix  of  R(p),  where  J(p)ij  =  ^(pI,  Then,  the  first 
derivative  of  f(p)  —  \R(p)T  R{p)  is 

Nr 

v/(p)  =  £/*(?)  •  VRi(p)  =  J(p)tR(p). 

2=1 

Similarly,  the  second  derivative  of  f(p)  is 

Nr 

V2/(p)  =  E(Vf?t(p)  •  VRi{p)T  +  Ri(p)  ■  V2Ri(p)) 

t=i 
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v7(ri  =  J(p)tj(p)  +  s(p) 

Nr 

where  S  ( p )  =  •  V2,%(p) 

1  =  1 

is  the  second-order  information  in  the  Hessian  of  f(p)-  Then,  at  each  itera¬ 
tion,  a  step  s  is  chosen  to  solve  the  following  trust  region  subproblem: 

minimize  mc(pc  +  s)  -  \R(pc)T  R(pc)  +  (J  {pc)T  R(pc))T  s 

+  sT  (J(pc)T  J  (pc)  +  S(pc))s 

subject  to  || s || 2  <  8C  , 

where  mc  is  the  local  quadratic  model  of  /  at  the  current  parameter  vector 
pc.  The  trust  region  radius  8C  provides  a  region  in  which  we  can  “trust”  the 
local  model  mc. 

At  each  iteration  we  must  solve  the  initial  value  problem  (1)  in  order 
to  compute  R(pc)-  We  can  use  finite  differences  to  compute  the  required 
Jacobian  of  R ,  or  we  can  use  the  sensitivity  equations.  In  the  sensitivity 
equations  approach,  the  following  system  of  equations: 


d  ,  % . 

dt  dpj 


m  dvk, 

dpj  dyk  dpj  ’ 


l  —  1  ,  .  .  .  ,  Ny ,  J  1  ,  .  .  .  ,  Np 


with  appropriate  initial  conditions  is  numerically  integrated  to  obtain  an 
approximation  to  the  Jacobian.  Notice  that  this  approach  requires  the  si¬ 
multaneous  integration  of  the  initial  value  problem  (1),  and  it  requires  an¬ 
alytical  expressions  for  the  derivatives  and  Finally,  we  use  a  secant 
approximation  to  the  second-order  portion  of  the  Hessian  [10]. 

Numerical  testing  indicates  that  these  nonlinear  least  squares  algorithms 
with  the  Jacobian  calculated  using  finite  differences  or  using  the  sensitivity 
equations  work  well  on  a  wide  variety  of  test  problems.  However,  these 
algorithms  have  room  for  improvement  in  two  major  areas.  Our  experience 
indicates  that  a  good  initial  guess  for  the  parameter  vector  is  important, 
but  not  because  of  the  optimization  method.  When  the  initial  guess  is  far 
from  the  solution,  the  difficulty  is  that  the  algorithm  may  reach  a  parameter 
vector  for  which  the  resulting  initial  value  problem  is  unstable  and  where  the 
residual  can  not  be  calculated. 

In  addition,  the  calculation  of  the  residual  and  of  the  columns  of  the  Ja¬ 
cobian  are  computationally  intensive.  In  fact,  experience  indicates  that  typ¬ 
ically  ninety  percent  or  more  of  the  cpu  time  is  spent  on  these  calculations. 
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One  way  to  handle  this  difficulty  is  through  the  use  of  parallel  computing 
to  exploit  the  independence  of  the  calculation  of  the  residual  and  each  of 
the  columns  of  the  Jacobian.  This  parallelism  occurs  very  naturally  in  the 
finite  difference  nonlinear  least  squares  method.  Similarly,  in  the  sensitivity 
equations  approach,  each  column  of  the  Jacobian  requires  the  numerical  in¬ 
tegration  of  an  independent  block  of  equations.  However,  the  parallelism  in 
this  method  is  complicated  by  the  fact  that  y'  —  F(t ,  y;  p)  must  be  integrated 
along  with  each  block  of  equations  in  order  to  evaluate  the  partial  derivative 
matrices.  In  addition,  further  speed-up  will  be  obtained  by  exploiting  the 
parallelism  in  the  linear  algebra  needed  for  the  calculation  of  the  optimization 
step,  [5],  [9]. 

4  Nonlinear  Programming  Approaches 

In  order  to  improve  the  stability  and  efficiency  of  our  algorithms,  we  want  to 
investigate  techniques  for  calculating  the  residual  based  on  boundary  value 
methods  [11],  [14].  Specifically,  we  are  considering  a  multiple  shooting  tech¬ 
nique  in  place  of  the  initial  value  methods  for  the  numerical  integration  of  the 
system  given  in  (1).  Multiple  shooting  can  be  considered  a  form  of  domain  de¬ 
composition  for  ordinary  differential  equations.  Thus,  in  addition  to  adding 
stability,  multiple  shooting  provides  a  technique  for  introducing  additional 
parallelism  into  the  parameter  identification  problem.  Furthermore,  multiple 
shooting  leads  to  a  constrained  formulation  of  the  optimization  problem  that 
will  allow  us  to  use  methods  which  we  believe  will  be  more  efficient  than  the 
initial  value  approaches  presented  in  the  previous  section. 

Let  tj  =  max{tdata!j;  i  =  1  ,...,Ny\j  =  0, . . . ,  Nd(i)}.  In  the  multiple 
shooting  method,  shooting  parameters,  Zk(p )  6  lRNy;  k  =  1  are 

introduced  into  the  problem  at  time  points  Tk ;  k  —  1, . . . ,  M,  satisfying 


t0  <  Tx  <  T2  <  T3  . . .  Tm  <  tf  ■ 


The  shooting  parameters,  zy,  k  =  1 ,  represent  the  solution  of  the 

initial  value  problem  (1)  at  the  times  Tk\k  =  1 , . . . ,  M.  Let  y(f,p;Tfc,  Zk) 
denote  the  solution  of 

y'  =  F(t,y,p );  with  initial  values  y(Tk\p )  =  Zk  (8) 
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on  the  interval  [Tk,Tk+i].  Then,  the  problem  consists  of  determining  the 
vectors  zk ;  k  =  1, . . . ,  M ;  in  such  a  way  that  the  function 


y(t\p) 


y(t,p-,t0,y(t0)) 
•  y(t,p-,Tk,zk ) 

k  y(t,p;TM,zM ) 


for  t  6  [<0,7i) 

for  t  G  [Tk,Tk+1)]  k  =  1,. . .  ,Af  -  1 
for 


is  continuous  over  the  entire  interval,  and  thus  a  solution  of  the  initial  value 
problem  (1).  This  yields  continuity  conditions  at  each  Tk\  k  =  1, . . . ,  M  for 
the  unknowns  zk ;  k  =  1  These  conditions  constitute  a  nonlinear 

system  of  equations  of  dimension  M  ■  Ny  of  the  form: 


y(Ti,p;t0,y(t0))  -  zi 
y{T2,p-,Tuzx)  -  z2 

.  v{Tmi  Pw,  Tm-i,zm-i)  -  zm  . 

Thus,  the  numerical  integration  of  the  system  of  differential  equations  (1) 
now  requires  the  solution  of  the  nonlinear  system  given  by  (9). 

The  most  obvious  way  to  add  the  stability  of  the  multiple  shooting  ap¬ 
proach  to  the  algorithms  described  in  the  previous  section  is  to  calculate 
the  residual  using  this  multiple  shooting  technique  instead  of  an  initial  value 
method.  Unfortunately,  each  residual  calculation  now  has  the  additional  ex¬ 
pense  of  the  iterative  solution  of  the  nonlinear  system  of  equations  (9).  Even 
though  we  could  use  a  structured  variation  of  Broyden’s  method  to  calcu¬ 
late  the  Jacobian  of  h  at  each  step  of  the  iterative  solution  of  this  nonlinear 
system  of  equations,  we  must  numerically  integrate  each  of  the  initial  value 
problems  of  the  form  (8)  in  order  to  evaluate  h(pc,  z(pc)).  Therefore,  it  is 
clear  that  merely  substituting  multiple  shooting  for  the  initial  value  meth¬ 
ods  in  the  algorithms  discussed  in  Section  3  will  lead  to  more  stable,  but 
slower  algorithms.  However,  we  will  now  show  that  multiple  shooting  leads 
to  a  different  formulation  of  the  optimization  problem  which  will  allow  us  to 
develop  a  more  efficient  and  stable  algorithm. 

Recall  that  we  are  interested  in  solving  the  following  optimization  prob¬ 
lem: 

minimize /(p)  =  -R(p)TR(p), 

and  that  calculating  the  residual  for  the  current  parameter  vector  using  mul¬ 
tiple  shooting  is  equivalent  to  solving  the  nonlinear  system,  h(pc,  z(pc ))  =  0. 


h(p,z)  = 
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Following  Bock,  [1],  [2],  we  consider  instead  the  following  constrained  param¬ 
eter  identification  problem: 

Problem  CPID: 

minimize  f{p,z)  =  -R(p,  z)T R(p,  z) 
subject  to  h(p,z)  =  0, 

where  p  and  2  are  regarded  as  independent  variables.  In  this  context,  the 
initial  value  approaches  described  in  Section  3  can  be  considered  as  meth¬ 
ods  which  require  each  parameter  iterate  to  be  feasible,  i.e.  y(t\p )  solves 
the  initial  value  problem  (1)  at  each  iteration,  by  regarding  z  as  dependent 
on  p.  However,  this  constrained  formulation  will  allow  us  to  consider  algo¬ 
rithms  designed  for  equality  constrained  optimization:  algorithms  which  will 
allow  infeasible  iterates.  Nonlinear  programming  experience  has  shown  that 
algorithms  which  allow  infeasible  iterates  are  generally  more  efficient  than 
algorithms  which  require  each  iterate  to  be  feasible. 

4.1  SQP  Formulation 

One  of  the  most  popular  methods  for  solving  the  equality  constrained  opti¬ 
mization  problem  is  the  successive  quadratic  programming  (SQP)  method. 
At  each  iteration,  the  SQP  method  solves  a  quadratic  programming  problem 
of  the  form 

Problem  QP:  (10) 

minimize  V  f(p,z)T s  +  -sTBs 

£ 

subject  to  h(p,  z )  +  V/i(p,  z)Ts  =  0 

where  B  is  an  approximation  to  the  Hessian  of  the  Lagrangian  l(p,z,\)  for 
the  step  sqp  =  (A p,  A z)T  and  the  Lagrange  multipliers  \®p .  The  Lagrangian 
function  associated  with  Problem  CPID  is  the  function 

Kp ,  A)  =  /(p,  z)  +  A Th(p,  z)  (11) 

where  A  =  (Ai,A2,...,A np-m)T  are  the  Lagrange  multipliers.  We  say  that 
the  point  ( p+,z+)T  =  ( pc,zc)T  +  (Ap,Az)T  is  linearly  feasible  if  it  satisfies 
the  linearized  constraint 

h(pc,zc)  +  V/i(pc,  zc)T  s  =  0 
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where  s  =  (Ap,  A z)T . 

The  fast  local  convergence  properties  of  the  SQP  method  have  been  fairly 
well  established,  but  the  issue  of  a  satisfactory  globalization  strategy  still 
remains  open.  A  number  of  line  search  techniques  have  been  proposed  by 
various  authors,  but  none  of  them  have  proven  to  be  particularly  successful. 
One  of  the  reasons  for  this  is  that  there  is  no  natural  descent  function  for 
the  constrained  optimization  problem  due  to  the  conflict  between  decreasing 
the  objective  function  and  moving  towards  feasibility. 

H.  G.  Bock  and  his  coworkers  have  implemented  a  Han-Powell  type  SQP 
method  for  the  parameter  identification  problem  in  their  code  PARFIT  [1], 
[2],  [3].  Unfortunately,  they  experience  all  of  the  difficulties  that  make  SQP 
implementations  hard.  Older  implementations  of  SQP  encounter  difficulties 
in  handling  rank  degeneracy  in  the  Jacobian  of  the  constraints,  lack  of  second- 
order  sufficiency  in  the  QP  subproblem,  and  the  situation  when  the  solution 
to  the  QP  subproblem  is  not  a  good  descent  direction  for  the  standard  merit 
functions.  We  believe  that  our  new  algorithm  will  overcome  these  difficulties. 

4.2  CDT  Formulation 

In  this  section  we  will  discuss  our  new  algorithm  which  is  based  on  a  trust 
region  globalization  strategy  for  equality  constrained  optimization  that  has 
been  developed  by  Celis,  Dennis,  and  Tapia,  [7],  [8].  The  most  obvious  trust 
region  approach  is  to  simply  add  a  trust  region  constraint,  i.e.  a  constraint 
which  limits  the  size  of  the  step,  to  Problem  QP.  Unfortunately,  the  resulting 
trust  region  subproblem  may  not  have  a  solution  since  there  may  not  be  a 
linearly  feasible  point  inside  the  trust  region.  Therefore,  we  want  to  relax  the 
constraint  h(pc,zc )  +  Vh(pc,  zc)T s  —  0  so  that  the  trust  region  subproblem 
has  a  solution  without  sacrificing  convergence  to  a  feasible  point. 

As  motivation  for  how  much  linear  feasibility  we  will  require  at  each  it¬ 
eration,  consider  solving  h(p,z )  =  0  using  an  unconstrained  trust  region  ap¬ 
proach.  At  each  iteration  we  would  minimize  a  quadratic  model  of  ||| h(p,  z) ||| 
subject  to  a  trust  region  constraint  on  the  length  of  the  step,  i.  e.  ,  ||s||2  <  Sc. 
Let  scp  =  — acV h{pc,  zc)T h(pc,  zc)  denote  the  step  to  the  Cauchy  point,  i.e., 
the  minimizer  in  the  trust  region  {s  :  ||s||2  <  <5>c)  of  \\h.(pc,  zc)  +  V h(pc,  zc)T s\\l 
along  the  direction  of  its  negative  gradient.  If  the  algorithm  took  the  step 
to  the  Cauchy  point  scp  on  the  model  || h(pc,zc)  +  Vh(pc:zc)Ts ||| ,  then  un¬ 
der  mild  assumptions,  first  order  convergence  can  be  established,  [6].  With 
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this  motivation,  Celis,  Dennis,  and  Tapia  consider  the  following  trust  region 
subproblem: 

Problem  NDTR: 

minimize  V/(pc,  zc)Ts  +  -sTBs 

z 

subject  to  || h(pc,zc)  +  Vh(pc,  zc)T s\\2  <  0C 

IMk  <  No¬ 
where  6C  is  chosen  to  be  \\h(pc,zc)  -f-  Vh(pc, zc)T.s||2  for  some  s  that  satisfies 
p||2  <  6C.  For  example,  s  could  be  taken  to  be  the  step  to  the  Cauchy  point, 
Sep ■  Unfortunately,  in  the  case  where  both  of  the  quadratic  constraints  are 
binding,  Problem  NDTR  becomes  difficult  and  expensive  to  solve. 

Motivated  by  the  work  of  Byrd,  Schnabel  and  Shultz  [4]  on  trust  region 
methods  for  unconstrained  optimization,  Celis,  Dennis  and  Tapia  [8]  have 
developed  a  more  convenient  trust  region  subproblem  by  restricting  the  trust 
region  subproblem  to  a  two-dimensional  subspace  spanned  by  sqp  and  s  as 
follows: 

Problem  2DTR: 

minimize  V  f(pc,  zc)T s  +  -sTBs 

z 

subject  to  ||  h(pc,zc)  +  h(pc,  zc)T s ||2  <  9C 

IMk  <  <5C 

s  G  span {sqp,s} 

where  sqp  is  the  solution  to  Problem  QP.  Numerical  testing  on  standard 
nonlinear  programming  problems  indicates  that  this  two-dimensional  trust 
region  algorithm  is  generally  more  efficient  than  other  SQP  implementations. 
Furthermore,  we  will  be  assured  of  global  convergence,  under  reasonable 
conditions,  by  the  results  given  in  the  thesis  of  El-Alem  [12]. 

Thus,  our  new  algorithm  retains  the  good  stability  properties  of  the  mul¬ 
tiple  shooting  approach,  and  in  addition,  we  believe  that  it  will  prove  to 
be  more  efficient  and  economical  than  the  algorithms  based  on  initial  value 
methods.  First,  our  new  algorithm  should  be  more  efficient  than  the  ini¬ 
tial  value  approaches  since  it  will  allow  infeasible  iterates.  The  initial  value 
approaches  require  each  iterate  to  be  feasible.  In  addition,  we  expect  our  al¬ 
gorithm  to  be  more  efficient  than  other  SQP  implementations  because  we  do 
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not  require  the  iterates  to  be  linearly  feasible.  This  feature  of  the  algorithm 
also  overcomes  the  difficulty  when  there  is  no  linearly  feasible  point. 

In  addition  to  its  stability  and  efficiency,  our  new  algorithm  provides  a 
technique  for  introducing  additional  parallelism  into  the  solution  of  the  pa¬ 
rameter  identification  problem.  This  is  due  to  the  crucial  observation  that 
the  multiple  shooting  approach  can  be  viewed  as  a  powerful  domain  decom¬ 
position  technique  for  this  problem.  Furthermore,  our  new  algorithm  retains 
the  parallelism  inherent  in  the  initial  value  approaches  since  the  residual  and 
the  columns  of  the  Jacobian  of  the  residual  are  still  independent  calculations. 
The  Jacobian  can  be  calculated  using  either  finite  differences  or  the  sensi¬ 
tivity  equations,  and  the  obvious  parallelism  in  both  of  these  methods  has 
been  discussed  previously.  As  a  result  of  the  multiple  shooting  approach, 
the  numerical  integration  needed  for  the  computation  of  h(p ,  z)  and  the  nec¬ 
essary  Jacobian  matrices  has  been  divided  into  many  shorter,  independent 
numerical  integrations  on  the  subintervals  [T,-,Tl+i]. 

Thus,  this  domain  decomposition  technique  is  a  powerful  tool  that  will 
allow  us  to  overcome  several  difficulties.  First,  we  will  choose  the  shooting 
points,  i.e.  T;,  that  are  necessary  to  obtain  stability  of  the  initial  value  prob¬ 
lem.  In  addition,  this  multiple  shooting  approach  provides  us  with  a  flexible 
framework  for  load  balancing,  for  we  can  now  choose  additional  shooting 
points  to  further  divide  the  time  interval  into  smaller  intervals  to  make  ef¬ 
fective  use  of  the  number  of  processors  that  are  available.  Finally,  this  al¬ 
gorithm  also  has  opportunities  for  further  speed-up  within  the  trust  region 
subproblem  since  this  subproblem  can  be  divided  into  several  independent 
computations. 

5  Conclusion 

Through  the  use  of  multiple  shooting,  we  have  developed  a  new  algorithm 
for  the  parameter  identification  problem  that  promises  to  be  more  efficient 
than  the  initial  value  approach  and  other  SQP  implementations.  In  addi¬ 
tion,  multiple  shooting  provides  a  domain  decomposition  technique  which 
allows  us  to  judiciously  introduce  parallelism  into  this  optimization  problem. 
Furthermore,  this  approach  already  puts  us  in  the  proper  framework  to  treat 
important  practical  side  constraints  on  the  parameter  vector,  both  direct  con¬ 
straints,  like  p  >  0,  or  more  interesting  state  constraints  like  G(t,y-p )  >  0. 
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This  paper  provides  an  overview  of  a  new  parallel  optimization  algorithm 
for  the  solution  of  the  parameter  estimation  problem  in  systems  of  ordinary 
differential  equations.  Further  details  may  be  found  in  [15]. 

The  authors  wish  to  thank  Richard  Tapia  and  David  Dobson  for  their 
helpful  comments. 
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